An Improved Naive Bayes Text Classification Algorithm In Chinese Information Processing

نویسنده

  • Lingling Yuan
چکیده

In Chinese information processing, Naive Bayes is a simple text classification method that is easily implemented. Its core is the realization of the calculating posterior probability algorithm and the effectively reducing dimension for feature words. This paper improved Naive Bayes text classification from the calculating posterior probability and the reducing dimension of feature words of text. The result of experiment indicated that the improved method is of the higher efficiency than the original algorithm.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the fast increase of the documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the big size of features space in TDC. TDC includes different actions such as text processing, feature extraction, form...

متن کامل

Review Paper on Sentiment Analysis of Twitter Data Using Text Mining and Hybrid Classification Approach

In Sentiment analysis we use natural language processing and information to extracting writer’s comments or reviews. In this paper we use Data text mining and hybrid approach of KNN Algorithm and Naïve Bayes Algorithm to find the sentiments of Indian people on Tweeter.

متن کامل

Iwona Żak * Marcin Ciura Automatic Text Categorisation

The paper presents a module for classifying Polish text, intended for use in an automatic processing of job advertisements. Two classifying algorithms are implemented: a naive Bayes classifier and TFIDF algorithm. Stop lists and stemming are used to improve the processing efficiency.

متن کامل

Implementation and Evaluation of Scalable Approaches for Automatic Chinese Text Categorization

The purpose of this research is to identify scalable approaches that can handle large amount of training data such as several years of news articles, and automatically assign predefined category to Chinese free text documents. Our approach consists of the following processes: (i) term extraction, (ii) term selection, and (iii) document classification. The approach first builds a recently develo...

متن کامل

The study on the spam filtering technology based on Bayesian algorithm

This paper analyzed spam filtering technology, carried out a detailed study of Naive Bayes algorithm, and proposed the improved Naive Bayesian mail filtering technology. Improvement can be seen in text selection as well as feature extraction. The general Bayesian text classification algorithm mostly takes information gain and cross-entropy algorithm in feature selection. Through the principle o...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010